Inserting A Compute Filter

Up till now, we have only used the vertex and pixel shader stages. However, many other stages are available to create effects! This tutorial will focus on using the compute shader stage to add a sepia filter on top of our rendering.

While this kind of effect can also be achieved through a PostProcessPass, using compute capabilities can help leverage the full performance of the hardware. For instance, with Dx12, it will in the long term be possible to use the async compute functionality of the hardware. This tutorial builds on top of the preceding tutorial, so be sure you have a good understanding of what has been done so far. Let's not wait any longer and dig into the subject!

Preparing the resources

Compute programs behave the same as any other program when reading data provided by the application. It is possible to give them a ConstantBuffer, a Texture, and so on. However, writing to a resource is slightly different.

In the compute stage, no target can be written to. In fact, nothing is rasterized, and as such, compute programs usually write manually to buffers that can be later interpreted by the application. In graphics API terms, this is an Unordered Access View (UAV) resource, which can be both read from and written to.

Within nkGraphics, we will use resources from the Buffer class. These buffers can hold binary data that can be freely interpreted by the application. They can also be prepared to be read and written from programs, which makes them the way to exchange data with compute shaders.

Let's first see how we can create such a resource. The includes are:

#include <NilkinsGraphics/Buffers/Buffer.h>
#include <NilkinsGraphics/Buffers/BufferManager.h>

Then, we can create the buffer:

nkGraphics::Buffer* buffer = nkGraphics::BufferManager::getInstance()->createOrRetrieve("filterBuffer") ;

As usual, we use the dedicated manager to allocate the resource and keep track of it. The next step is to set the buffer up. First, we specify what kind of usage it is meant for:

buffer->prepareForComputeResourceUsage() ;
buffer->prepareForShaderResourceUsage() ;

The buffer will be used both for compute usage (i.e. writing, UAV) and shader resource usage (standard reading, like a texture). These two calls allow the component to properly set up the buffer for what it is meant for. Next, we need to set up the size of the buffer.

// We will consider it as an array of pixels of 4 bytes (R8G8B8A8 -> 32 bits)
buffer->setElementByteSize(4) ;
// Our image is 800x600 pixels
buffer->setElementCount(800 * 600) ;

Here we think in terms of an image, as this program will act as a filter. It will process pixels over an 800x600 image. As such, we first set the element byte size, corresponding to one pixel in its chosen format. Finally, we give the number of elements.
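To make the sizing concrete, here is a quick sanity check in plain C++ (the bufferByteSize helper is ours, purely illustrative, not part of nkGraphics):

```cpp
#include <cstdint>

// Mirrors what setElementByteSize / setElementCount imply :
// total allocation = element byte size * element count.
std::uint64_t bufferByteSize (std::uint32_t width, std::uint32_t height, std::uint32_t bytesPerPixel)
{
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel ;
}

// For our 800x600 target with 4-byte pixels :
// 800 * 600 = 480000 elements, so 1920000 bytes in total.
```

This is worth keeping in mind when scaling the target up: a 4K buffer at the same format is roughly 33 MB.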

buffer->load() ;

To end this step, we trigger the loading of the buffer so that it allocates all the rendering resources it needs.

Once the buffer is ready, we can move on to the next required resource: the texture that will receive the intermediate rendering for the filter to read from. This texture is a render target sized to the window, and it will be referenced as the color target in a TargetOperations we will detail later.

nkGraphics::Texture* tex = nkGraphics::TextureManager::getInstance()->createOrRetrieve("sceneTarget") ;

// First, set up its size to the window size
tex->setWidth(800) ;
tex->setHeight(600) ;
// Then it requires its format
tex->setTextureFormat(nkGraphics::R8G8B8A8_UNORM) ;
// A texture that can be used as a render target needs to be marked as such
tex->setRenderFlag(nkGraphics::TEX_RENDER_FLAG::RENDER_TARGET) ;
// We can request the load, everything is setup
tex->load() ;

This requires a little more setup than simply loading the texture from a file. As it is created manually, we need to specify some information so that nkGraphics knows what to create. We set its size, aligned with the window. Next, we ensure the format used is the one we expect within the rendering pipeline. An important step is to flag textures that are meant to be render targets; the same applies to targets used for depth. Once everything is set up, we load the texture, making it ready for use.

Setting up the new shaders

Once the resources are ready, we need to set up our new effect. We will require 2 new shaders: a compute one processing the data, and a post process one copying the data back into our rendering surface.

Let's begin by setting up the compute shader. First, it requires its program:

nkGraphics::Program* filterProgram = nkGraphics::ProgramManager::getInstance()->createOrRetrieve("filterProgram") ;

nkGraphics::ProgramSourcesHolder sources ;

// This program will only use compute stage
sources.setComputeMemory
(
    R"eos(
        cbuffer passConstants
        {
            uint4 texInfos ;
        }

        RWStructuredBuffer<uint> bufOut : register(u0) ;
        Texture2D inTex : register(t0) ;

        [numthreads(32, 32, 1)]
        void main (int3 dispatchThreadID : SV_DispatchThreadID)
        {
            if (dispatchThreadID.x < texInfos.x && dispatchThreadID.y < texInfos.y)
            {
                float4 texCol = inTex.Load(uint3(dispatchThreadID.xy, 0)) ;

                const float3x3 sepiaMask = {0.393, 0.349, 0.272,
                                            0.769, 0.686, 0.534,
                                            0.189, 0.168, 0.131} ;
                texCol.rgb = saturate(mul(texCol.rgb, sepiaMask)) ;

                uint texColPacked = 0 ;
                texColPacked += ((uint)(texCol.r * 255)) << 24 ;
                texColPacked += ((uint)(texCol.g * 255)) << 16 ;
                texColPacked += ((uint)(texCol.b * 255)) << 8 ;
                texColPacked += 0xFF ;

                int index = dispatchThreadID.x + dispatchThreadID.y * texInfos.x ;
                bufOut[index] = texColPacked ;
            }
        }
    )eos"
) ;

filterProgram->setFromMemory(sources) ;
filterProgram->load() ;

We create the program through the manager and prepare its sources before triggering the load. This time, the sources only feed one stage: the compute one. The program will sort out what it needs to do when loading itself, based on the sources provided.

The program here is pretty straightforward, once we get past the new syntax introduced for compute shading. We have a constant buffer, which will receive target information so that we know which pixels we have to process. Then, we get the structure describing how we will store the data we compute.

This data is accessed through the RWStructuredBuffer primitive, registered to a UAV slot. This structure allows reading from and writing to the attached buffer. The buffer will be made of unsigned integers in which we store our colors, packed as R8G8B8A8.

Next come the texture we will read from, and the program's main function. The line just above it means we will spawn thread groups of 32x32x1 threads. Let's keep that in mind for later, as it will determine the number of groups we want to spawn.

The idea of the main function is that one thread processes one pixel, so we need to work out, from the thread's ID, which pixel it corresponds to. The function first checks whether the thread ID is within the texture's boundary: we would not want to touch a value outside of the buffer. After the check, it loads the color and applies the sepia filter.
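A note on the sepia math: in HLSL, mul(v, M) with a row vector dots v with each *column* of the matrix, which yields the classic sepia weights. A minimal CPU sketch in plain C++ (the Rgb type and sepia function are ours, purely illustrative):

```cpp
#include <algorithm>

struct Rgb { float r, g, b ; } ;

// CPU equivalent of the HLSL line : texCol.rgb = saturate(mul(texCol.rgb, sepiaMask)).
// Each output channel is a dot product of the input with one column of the mask.
Rgb sepia (Rgb c)
{
    auto sat = [] (float v) { return std::clamp(v, 0.0f, 1.0f) ; } ;

    return {
        sat(0.393f * c.r + 0.769f * c.g + 0.189f * c.b),
        sat(0.349f * c.r + 0.686f * c.g + 0.168f * c.b),
        sat(0.272f * c.r + 0.534f * c.g + 0.131f * c.b)
    } ;
}

// Pure white saturates the red and green channels (their weights sum above 1),
// which is why the saturate() call in the shader matters.
```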

The final step is probably one we could avoid by using floats rather than compacting the data, but I found it interesting, both for the reader and the writer (me :D), to take a look at data packing. We use bit-wise operations to shift data around and pack a pixel into a uint, each channel occupying its own byte. We can then store the value in the buffer, at the index computed for the given thread.
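The packing can be checked on the CPU. This sketch mirrors the shader's shifts exactly, and the unpack mirrors what the copy program does with the buffer (helper names are ours, purely illustrative):

```cpp
#include <cstdint>

// CPU mirror of the shader's packing : R in the top byte, then G, B, and 0xFF for alpha.
std::uint32_t packRgba8 (float r, float g, float b)
{
    std::uint32_t packed = 0 ;
    packed += ((std::uint32_t)(r * 255)) << 24 ;
    packed += ((std::uint32_t)(g * 255)) << 16 ;
    packed += ((std::uint32_t)(b * 255)) << 8 ;
    packed += 0xFF ;
    return packed ;
}

// Mirror of the unpacking done in the copy program's pixel stage.
float unpackR (std::uint32_t packed) { return ((packed & 0xFF000000) >> 24) / 255.0f ; }

// Opaque red packs to 0xFF0000FF, and unpacks back to 1.0 on the red channel.
```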

Now that the program is ready, we need to create the shader that will use it.

nkGraphics::Shader* filterShader = nkGraphics::ShaderManager::getInstance()->createOrRetrieve("filterShader") ;
nkGraphics::Program* filterProgram = nkGraphics::ProgramManager::getInstance()->get("filterProgram") ;
filterShader->setProgram(filterProgram) ;

// Constant buffer only needs the size of the target used
nkGraphics::ConstantBuffer* cBuffer = filterShader->addConstantBuffer(0) ;
// We will find it through our offscreen texture
nkGraphics::Texture* tex = nkGraphics::TextureManager::getInstance()->get("sceneTarget") ;
nkGraphics::ShaderPassMemorySlot* slot = cBuffer->addPassMemorySlot() ;
slot->setAsTextureSize(tex) ;

// Prepare for texture
filterShader->addTexture(tex, 0) ;

// Finally, we need to bind our buffer to the UAV slots
// An UAV slot allows for writing into it
nkGraphics::Buffer* buffer = nkGraphics::BufferManager::getInstance()->get("filterBuffer") ;
filterShader->addUavBuffer(buffer, 0) ;

// Finalize loading
filterShader->load() ;

The story is the same as usual: create the shader, attach its program, and specify how resources are fed. While this repeats much of what we have done so far, you might notice a new way of specifying the buffer, via the addUavBuffer call. This function adds a buffer that is meant to be written to. From an HLSL point of view, these resources are linked to the UAV slots, which is where our RWStructuredBuffer lives.

This covers the filter shader. Now we need another step in our shader chain, a last one copying the buffer back into the final rendering surface. Let's see what its program looks like:

// This program will copy data from the buffer filled by our compute program and paste it onto the screen
nkGraphics::Program* bufferCopyProgram = nkGraphics::ProgramManager::getInstance()->createOrRetrieve("bufferCopyProgram") ;

nkGraphics::ProgramSourcesHolder sources ;

sources.setVertexMemory
(
    R"eos(
        struct VertexInput
        {
            float4 position : POSITION ;
            float2 uvs : TEXCOORD0 ;
        } ;

        struct PixelInput
        {
            float4 position : SV_POSITION ;
            float2 uvs : TEXCOORD0 ;
            uint4 texInfos : TEXINFOS ;
        } ;

        cbuffer passConstants
        {
            uint4 texInfos ;
        }

        PixelInput main (VertexInput input)
        {
            PixelInput result ;

            result.position = input.position ;
            result.uvs = input.uvs ;
            result.texInfos = texInfos ;

            return result ;
        }
    )eos"
) ;

sources.setPixelMemory
(
    R"eos(
        struct PixelInput
        {
            float4 position : SV_POSITION ;
            float2 uvs : TEXCOORD0 ;
            uint4 texInfos : TEXINFOS ;
        } ;

        StructuredBuffer<uint> inputBuf : register(t0) ;

        float4 main (PixelInput input) : SV_TARGET
        {
            uint2 index = uint2(input.uvs.x * input.texInfos.x, input.uvs.y * input.texInfos.y) ;
            uint texColPacked = inputBuf[index.x + index.y * input.texInfos.x] ;

            uint texColR = (texColPacked & 0xFF000000) >> 24 ;
            uint texColG = (texColPacked & 0x00FF0000) >> 16 ;
            uint texColB = (texColPacked & 0x0000FF00) >> 8 ;

            float3 texCol = float3(texColR / 255.0, texColG / 255.0, texColB / 255.0) ;
            return float4(texCol, 1.0) ;
        }
    )eos"
) ;

bufferCopyProgram->setFromMemory(sources) ;
bufferCopyProgram->load() ;

This program will be used as a post process shader. As such, the vertex stage simply passes variables through to the pixel stage. Then, we basically do the opposite of the compute shader: match a pixel back to a thread index. We unpack the color and paste it onto the render target!

That leaves us with the shader to create:

nkGraphics::Shader* bufferCopyShader = nkGraphics::ShaderManager::getInstance()->createOrRetrieve("bufferCopyShader") ;
nkGraphics::Program* bufferCopyProgram = nkGraphics::ProgramManager::getInstance()->get("bufferCopyProgram") ;
bufferCopyShader->setProgram(bufferCopyProgram) ;

// Constant buffer only needs the size of the target used
nkGraphics::ConstantBuffer* cBuffer = bufferCopyShader->addConstantBuffer(0) ;
// Target will drive this
nkGraphics::ShaderPassMemorySlot* slot = cBuffer->addPassMemorySlot() ;
slot->setAsTargetSize() ;

// Then we need to bind our buffer to read from it
// This time, API-wise, it is considered to be a texture, as we will only read from it
nkGraphics::Buffer* buffer = nkGraphics::BufferManager::getInstance()->get("filterBuffer") ;
bufferCopyShader->addTexture(buffer, 0) ;

// Finalize loading
bufferCopyShader->load() ;

There is nothing really new here, apart from the way the buffer is fed to the program. We now feed it as a texture, so it will be bound to a texture slot. This is because our pixel stage only needs to read the buffer. As such, having it bound as a texture resource is a good way to access it.

And this justifies why we requested the buffer to be prepared both for compute (UAV) and shader resource (texture) work. This way, we ensured everything was ready for us to use it in both cases.

And with that, we have all the building blocks we need to prepare our compositor.

Updating the compositor

Our compositor needs to change a bit here. The idea is to:

  1. TargetOperations: render to the sceneTarget texture
    1. ClearTargetsPass: clear the target
    2. RenderScenePass: render the sphere
    3. PostProcessPass: render the background
    4. ComputePass: run the filter
  2. TargetOperations: render to the context surface
    1. PostProcessPass: copy the filtered buffer into the target

Which results in code like:

nkGraphics::Compositor* compositor = nkGraphics::CompositorManager::getInstance()->createOrRetrieve("compositor") ;
nkGraphics::CompositorNode* node = compositor->addNode() ;

// First operation will render offscreen, but still use the context's depth buffer (no need for another)
nkGraphics::TargetOperations* targetOp0 = node->addOperations() ;
targetOp0->addColorTarget(sceneTarget) ;
targetOp0->setToChainDepthBuffer(true) ;

// Unroll our passes
nkGraphics::ClearTargetsPass* clearPass = targetOp0->addClearTargetsPass() ;
nkGraphics::RenderScenePass* scenePass = targetOp0->addRenderScenePass() ;

nkGraphics::PostProcessPass* postProcessPass = targetOp0->addPostProcessPass() ;
postProcessPass->setBackProcess(true) ;
postProcessPass->setShader(envShader) ;

nkGraphics::ComputePass* computePass = targetOp0->addComputePass() ;
computePass->setShader(filterShader) ;
computePass->setX(25) ;
computePass->setY(19) ;

// This one will only be responsible for copying the buffer to the rendering surface
nkGraphics::TargetOperations* targetOp1 = node->addOperations() ;
targetOp1->setToBackBuffer(true) ;

nkGraphics::PostProcessPass* copyPass = targetOp1->addPostProcessPass() ;
copyPass->setBackProcess(false) ;
copyPass->setShader(bufferCopyShader) ;

First, we get the compositor, add a node inside, and declare the first target operation, which renders to the render target we created earlier. The passes are the ones described above: clear, render scene, post process, and compute. The compute pass runs the filter shader we declared earlier.

One important bit to remember is that our shader spawns groups of 32x32x1 threads. We need to cover 800x600 pixels, which means we need at least 25 groups on the X axis (25 * 32 = 800 threads) and at least 19 groups on the Y axis (19 * 32 = 608 threads). This is the purpose of setting the X and Y of the pass: they control how many groups are spawned on each axis.

Do note that if the texture were a different size, we would need to spawn more or fewer groups. Finding the balance between the number of threads per group and the number of groups often depends on the problem and the load the shader puts on the hardware. With nkGraphics, you have everything you need to try different setups and see which is most efficient for you.
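The group counts above are just a ceiling division of the target size by the group size, which is worth wrapping in a helper when target sizes vary (the groupCount helper is ours, not an nkGraphics function):

```cpp
#include <cstdint>

// Number of thread groups needed to cover `pixels` pixels on one axis,
// with `groupSize` threads per group : a ceiling division.
std::uint32_t groupCount (std::uint32_t pixels, std::uint32_t groupSize)
{
    return (pixels + groupSize - 1) / groupSize ;
}

// For our 800x600 target with [numthreads(32, 32, 1)] :
// groupCount(800, 32) gives 25 (25 * 32 = 800, an exact fit),
// groupCount(600, 32) gives 19 (19 * 32 = 608, the extra threads
// are discarded by the bounds check in the compute shader).
```

These values could then feed setX and setY on the compute pass instead of the hard-coded 25 and 19.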

Final step, we setup a new target to render to, and copy the buffer inside. That's it for our compositor !

If we now launch the program, we should witness a new feel for our rendering:

Result
The screen is now sepia-filtered ! So melancholic...

Conclusion

And with this, we have seen a bit more of how compositors can be leveraged to augment the rendering, along with a new pass type: the compute pass. Many things can be run in this general purpose stage, like the sepia filter we just implemented. It can serve many other purposes too: generating a density field for a marching cubes algorithm, intersection / visibility checks... You now have an idea of how this can be leveraged within nkGraphics.

This tutorial is now done, I hope you enjoyed it. But nkGraphics has more capabilities, so this is not the end of the tutorial series... Hang on tight for the next ones!